Automating Biomedical Data Science Through Tree-Based Pipeline Optimization

Authors

  • Randal S. Olson
  • Ryan J. Urbanowicz
  • Peter C. Andrews
  • Nicole A. Lavender
  • La Creis Kidd
  • Jason H. Moore
Abstract

Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators, such as synthetic feature constructors, that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design.

Similar resources

An Improved Optimization Model for Scheduling of a Multi-Product Tree-Like Pipeline

In the petroleum supply chain, refined oil products are often delivered to distribution centers by pipeline, since pipelines provide the most reliable and economical mode of transportation over large distances. This paper addresses the optimal scheduling of a complex pipeline network with multiple branching lines. The main challenge is to find the optimal sequence and time of product injections/deli...

TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning

As data science becomes more mainstream, there will be an ever-growing demand for data science tools that are more accessible, flexible, and scalable. In response to this demand, automated machine learning (AutoML) researchers have begun building systems that automate the process of designing and optimizing machine learning pipelines. In this paper we present TPOT v0.3, an open source genetic p...

Optimization of Tree-Structured Gas Distribution Network Using Ant Colony Optimization: A Case Study

An Ant Colony Optimization (ACO) algorithm is proposed for optimizing a tree-structured natural gas distribution network. The design of pipelines, facilities, and equipment systems is a necessary task in configuring an optimal natural gas network. A mixed integer programming model is formulated to minimize the total cost in the network. The aim is to optimize pipe diameter sizes so that the location-alloc...

Detailed Scheduling of Tree-like Pipeline Networks with Multiple Refineries

In the oil supply chain, the refined petroleum products are transported by various transportation modes, such as rail, road, vessel and pipeline. The latter provides one of the safest and cheapest ways to connect production areas to local markets. This paper addresses the operational scheduling of a multi-product tree-like pipeline connecting several refineries to multiple distribution centers ...

Discovery Informatics in Biological and Biomedical Sciences: Research Challenges and Opportunities

New discoveries in biological, biomedical and health sciences are increasingly being driven by our ability to acquire, share, integrate and analyze, and construct and simulate predictive models of biological systems. While much attention has focused on automating routine aspects of management and analysis of "big data", realizing the full potential of "big data" to accelerate discovery calls fo...

Publication date: 2016